Identification of microbial agents in tissue specimens of ocular and periocular sarcoidosis using a metagenomics approach

Background: Metagenomic sequencing has the potential to identify a wide range of pathogens in human tissue samples. Sarcoidosis is a complex disorder whose etiology remains unknown and for which a variety of infectious causes have been hypothesized. We sought to conduct metagenomic sequencing on cases of ocular and periocular sarcoidosis, none of which had a previously identified infectious cause.
Methods: Archival tissue specimens of 16 subjects with biopsies of ocular and periocular tissues that were positive for non-caseating granulomas were used as cases. Four archival tissue specimens that did not demonstrate non-caseating granulomas were also included as controls. Genomic DNA was extracted from tissue sections. DNA libraries were generated from the extracted genomic DNA and the libraries underwent next-generation sequencing.
Results: We generated between 4.8 and 20.7 million reads for each of the 16 cases plus four control samples. For eight of the cases, we identified microbial pathogens that were present well above the background, with one potential pathogen identified for seven of the cases and two possible pathogens for one of the cases. Five of the eight cases were associated with bacteria (Campylobacter concisus, Neisseria elongata, Streptococcus salivarius, Pseudopropionibacterium propionicum, and Paracoccus yeei), two cases with fungi (Exophiala oligosperma in one case, and Lomentospora prolificans and Aspergillus versicolor in the other) and one case with a virus (Mupapillomavirus 1). Interestingly, four of the five bacterial species are also part of the human oral microbiome.
Conclusions: Using metagenomic sequencing, we identified possible infectious causes in half of the ocular and periocular sarcoidosis cases analyzed. Our findings support the proposition that sarcoidosis could be an etiologically heterogenous disease. Because these are previously banked samples, direct follow-up in the respective patients is impossible, but these results suggest that sequencing may be a valuable tool in better understanding the etiopathogenesis of sarcoidosis and in diagnosing and treating this disease.


Introduction
Sarcoidosis is a systemic inflammatory disease characterized by the formation of non-caseating granulomas in the affected tissues 1 . Although sarcoidosis can affect almost any organ in the human body, it most commonly affects the lungs, the skin and the ocular and periocular tissues. The frequency of ocular and periocular involvement in patients with sarcoidosis ranges between 25% and 60% depending on the particular population studied 2 . The manifestations of ocular and periocular sarcoidosis include uveitis, conjunctival granulomas, eyelid granulomas, orbital inflammation, dacryoadenitis, dacryocystitis, scleritis and optic neuropathy. Ocular and periocular sarcoidosis accounts for about 5% of patients seen in a uveitis practice and it results in blindness in at least one eye in approximately 10% of the affected patients 2 .
In spite of numerous investigations that have been carried out since the first case of sarcoidosis was reported in 1877 by Jonathan Hutchinson 3 , the etiology of sarcoidosis remains unknown. However, there is very strong evidence that supports the assertion that the pathogenesis of sarcoid granulomas involves an oligoclonal CD4 T cell-mediated immune response to a persistent antigen, most likely an exogenous antigen derived from microbial or inanimate sources 4-6 . Microbial sources of antigens that have been suspected of causing sarcoidosis include bacteria such as mycobacteria, Propionibacterium acnes, Tropheryma whipplei and Borrelia burgdorferi, fungi such as Coccidioides spp., and viruses such as Epstein-Barr virus, cytomegalovirus and hepatitis C virus 4-10 . It is possible that different antigens could be involved in different patients, resulting in a diverse pattern of organ involvement, natural history and clinical course. In addition to exposure to the requisite antigen, it is believed that the disease occurs only within the appropriate genetic background of the host 5 .
The availability of next-generation sequencing (NGS) technologies has opened vast opportunities for pathogen discovery in human disease 11 . We hypothesized that metagenomic sequencing using NGS would identify pathogen-derived microbial DNA within sarcoid granulomas. We conducted a metagenomics analysis on DNA extracted from archival tissue specimens of 16 cases of ocular and periocular sarcoidosis, and we detected possible microbial pathogens in eight of the cases. We anticipate that the identification of potential microbial etiologies of sarcoidosis may lead to large-scale metagenomics studies that can be validated by pathogen isolation followed by investigations to establish the pathogenic role of the suspected microorganisms in the causation of sarcoidosis.

Ethics statement
The Johns Hopkins University School of Medicine Institutional Review Board (IRB) approved this study (approval number IRB00126932), which was undertaken in accordance with the principles of the Declaration of Helsinki and in compliance with the Health Insurance Portability and Accountability Act. The study was categorized by the IRB as 'Not Human Subjects Research' and as such obtaining informed consent was not required.
Collection of demographic, clinical and histopathological data
Demographic, clinical and histopathological data (from initial presentation to subsequent follow-ups) were retrospectively collected for each study subject by reviewing the electronic medical records of the subjects. For each subject, the data (including sex, age, race, diagnosis, and results of microbiological, histopathological and radiologic tests) were gathered in a de-identified manner before analysis was carried out.
Human tissue specimens
Paraffin-embedded archival tissue specimens of subjects who had biopsies of ocular and periocular tissues at the Wilmer Eye Institute of Johns Hopkins Hospital during the period between February 2010 and February 2017, and that were positive for non-caseating granulomas, were included in the study. A total of 18 such specimens were identified: six specimens of orbital tissues, two specimens of eyelid tissues, two specimens of the lacrimal sac, five specimens of the conjunctiva, one specimen of the cornea and two specimens of the globe. In addition, a total of four archival tissue specimens (one from the conjunctiva and three from the lacrimal gland) that did not demonstrate non-caseating granulomas were included to be used as controls.
For each archival tissue specimen, 10 sections, each 10 µm thick, were cut from the paraffin blocks and used for DNA extraction. Two of the conjunctival specimens that were positive for non-caseating granulomas were excluded from the study due to poor quality of the DNA extracted from the specimens. Therefore, a total of 16 positive specimens and four negative specimens were used.

DNA extraction
Genomic DNA was extracted from paraffin-embedded tissue sections using the QIAamp DNA FFPE tissue kit and deparaffinization solution (Catalog numbers 56404 and 19093, respectively; Qiagen, Valencia, CA, USA) according to the manufacturer's recommended and supplementary protocols. Quality of DNA was assessed by Genomic ScreenTape analysis on a TapeStation 2200 (Agilent Technologies, Santa Clara, CA, USA). The Quant-iT PicoGreen dsDNA reagent kit (Catalog number P7589, Invitrogen/ThermoFisher Scientific, Waltham, MA, USA) was used for quantitation of DNA samples, with fluorescent reads performed on a SpectraMax M2 plate reader (Molecular Devices, San Jose, CA, USA).

Generation of DNA library
Libraries were prepared from ten nanograms of DNA using the Ovation Ultralow V2 DNA-seq Library Preparation kit (Catalog number 0344, Tecan Genomics, Redwood City, CA, USA). The recommended protocol was followed with the exception of the initial fragmentation step. Fragmentation was performed enzymatically, instead of ultrasonically, using Celero fragmentation buffer and Celero fragmentation enzyme from the Celero PCR Workflow with Enzymatic Fragmentation kit (Catalog number 9363, Tecan Genomics). Fragmentation time was optimized to 10 minutes, and a modified purification was performed with AMPure XP beads (Catalog number A63881, Beckman Coulter, Brea, CA, USA). Library amplification was performed for 13 cycles, based on the manufacturer's recommendation for the starting input amount of DNA (10 ng), in an Applied Biosystems GeneAmp 9700 or Veriti thermal cycler (ThermoFisher Scientific). Cycling parameters were: 72°C for two minutes, 95°C for three minutes, 13 cycles of (98°C for 20 sec, 65°C for 30 sec, 72°C for 30 sec), 72°C for one minute, and a 4°C hold. Amplification primers and enzyme were part of the Ovation Ultralow V2 kit. Quality of purified libraries was assessed by D1000 ScreenTape analysis on a TapeStation 2200, with region analysis performed for sizing. Quantitation of libraries was performed by qPCR with the Kapa Library Quantitation kit for Illumina (Catalog number KK4824/07960140001, Roche, Basel, Switzerland) in an Applied Biosystems StepOne Plus Real Time PCR system (ThermoFisher Scientific). A six-point standard curve, with a concentration range of 20 pM to 0.0002 pM, was run as per the Kapa recommended protocol. Run parameters were an initial denaturation at 95°C for 5 minutes and 35 cycles (95°C for 30 sec denaturation and 60°C for 45 sec annealing/extension/data acquisition), followed by a ramp from 65°C to 95°C for melt curve analysis.
qPCR results and sizing data were imported to the Kapa Library Quantitation Data Template for calculations of library concentrations and yields. Libraries were diluted to 10 nM, and an equimolar pool prepared. A final quality assessment of the library pool was performed by High Sensitivity DNA Lab Chip Analysis on a BioAnalyzer 2100 (Agilent Technologies), and a final quantity check was performed on a Qubit Flex Fluorometer using Qubit High Sensitivity DNA reagents and standards (Catalog number Q32854, Invitrogen/ThermoFisher Scientific).
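The dilution of each library to a common concentration before pooling follows the usual C1·V1 = C2·V2 relationship. The sketch below illustrates that arithmetic only; the function name, the stock concentration and the final volume are hypothetical and are not taken from the study.

```python
def dilution_volumes(stock_nM, target_nM, final_volume_uL):
    """Return (library volume, diluent volume) in microliters.

    Uses C1 * V1 = C2 * V2. All input values are illustrative
    assumptions, not measurements from the study.
    """
    if stock_nM <= 0 or target_nM <= 0 or final_volume_uL <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_nM > stock_nM:
        raise ValueError("cannot dilute to a higher concentration")
    library_uL = target_nM * final_volume_uL / stock_nM
    return library_uL, final_volume_uL - library_uL


# Hypothetical library quantified at 40 nM, diluted to 10 nM in 20 uL.
library_uL, diluent_uL = dilution_volumes(40.0, 10.0, 20.0)
```

Once every library is at the same molar concentration, combining equal volumes yields an equimolar pool.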

Next-generation sequencing
Sequencing of the library pool was performed with a 300 cycle (2x150 bp) SP run on an Illumina NovaSeq6000 sequencer (Illumina, San Diego, CA, USA) at Johns Hopkins Genomics, Genetic Resources Core Facility, RRID:SCR_018669.

Analysis of metagenomics data
For each of the 20 metagenomics samples, we first removed all human sequences by aligning all paired reads to the GRCh38 human reference genome using Bowtie2 12 in very-sensitive mode. To ensure removal of all human sequences, we removed an entire read pair if either of the read mates aligned to the human reference.
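The pair-level filtering rule described above can be sketched as follows, assuming the aligner output is available as SAM text. The flag bits come from the SAM specification; the function names are hypothetical and this is not the authors' actual script.

```python
# SAM flag bits (SAM specification).
READ_UNMAPPED = 0x4   # this read did not align
MATE_UNMAPPED = 0x8   # the other read of the pair did not align


def pair_is_non_human(flag):
    """True only when neither mate aligned to the human reference."""
    return bool(flag & READ_UNMAPPED) and bool(flag & MATE_UNMAPPED)


def filter_non_human(sam_lines):
    """Yield SAM records belonging to pairs with no human alignment.

    A pair is discarded if either mate aligned, mirroring the
    conservative rule described in the text.
    """
    for line in sam_lines:
        if line.startswith("@"):      # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if pair_is_non_human(int(fields[1])):
            yield line
```

For example, a record with flag 77 (paired, read unmapped, mate unmapped, first in pair) is kept, whereas a record with flag 133 (read unmapped but mate mapped) is dropped because its mate aligned to the human genome.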
For each patient, we generated two runs of 150-bp paired-end sequencing data. For simplicity, we concatenated the reads by merging the two runs from each patient. We then compared all patient samples against a KrakenUniq 13 database consisting of 5,981 bacterial species (18,484 genomes), 295 archaeal species (374 genomes), 9,905 viral species (10,012 genomes), 250 eukaryotic pathogen (e.g. fungi, amoebas) species (388 genomes), the human GRCh38.p13 genome, and vector sequences. The total numbers of reads per sample, along with the numbers identified as microbial, are shown in Table 1.
KrakenUniq 13 classifies each read by breaking reads into overlapping k-mers, searching the database for the lowest common ancestor of each k-mer, and then assigning the overall read a taxon based on the k-mer taxon distribution. Unlike Kraken 1 14 and Kraken 2 15 , KrakenUniq reports, for every taxonomic classification, not only the read counts but also the number of distinct k-mers, giving extra confidence in classification. Hits with a low count of distinct k-mers are often false positives, e.g., due to low-complexity repetitive sequences in the genome of a pathogen.
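The idea of reporting both read counts and distinct k-mer counts can be illustrated with a toy sketch. This is a deliberate simplification and not KrakenUniq's implementation: the lowest-common-ancestor step is replaced by a direct k-mer-to-taxon lookup, each read is assigned by a simple majority vote, and distinct k-mers are tracked with an exact set. The toy database and all names are hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify_read(read, kmer_to_taxon, k):
    """Assign a read to the taxon hit by most of its k-mers (toy vote)."""
    hits = Counter(
        kmer_to_taxon[km] for km in kmers(read, k) if km in kmer_to_taxon
    )
    if not hits:
        return None
    return hits.most_common(1)[0][0]


def summarize(reads, kmer_to_taxon, k):
    """Per taxon, return (read count, number of distinct k-mers seen)."""
    read_counts = Counter()
    distinct = {}
    for read in reads:
        taxon = classify_read(read, kmer_to_taxon, k)
        if taxon is None:
            continue
        read_counts[taxon] += 1
        distinct.setdefault(taxon, set()).update(
            km for km in kmers(read, k) if kmer_to_taxon.get(km) == taxon
        )
    return {t: (read_counts[t], len(distinct[t])) for t in read_counts}


# Toy database with k = 3; real analyses use a much larger k.
toy_db = {"ACG": "taxonA", "CGT": "taxonA", "GTA": "taxonA", "TTT": "taxonB"}
summary = summarize(["ACGTA", "ACGTA", "TTTTT"], toy_db, 3)
```

In this toy input, the homopolymer read contributes a read to taxonB but only a single distinct k-mer, which is the kind of signature that flags a likely low-complexity false positive.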
In order to detect outlier read counts among the metagenomics samples, we used a modified Z-score calculation as defined by Iglewicz and Hoaglin 16 . As compared to a normal Z-score calculation, which uses mean values that may be influenced by extreme outliers, this formula uses the sample median and the median absolute deviation. The formula for the modified Z-score for sample i is as follows:
$$M_i = \frac{0.6745\,(X_i - X_{\text{median}})}{\text{MAD}}$$
where X_i is the read count for sample i, X_median is the median read count across all samples and MAD is the median absolute deviation. The median absolute deviation (MAD) is defined as the median of the absolute difference of the observation from the sample median:
$$\text{MAD} = \operatorname{median}_i\left(\lvert X_i - X_{\text{median}}\rvert\right)$$
Reads from species with a significant modified Z-score and a high distinct k-mer count were then extracted and aligned to the NCBI nucleotide database to verify whether they were true positives or whether they hit other species equally well or better, suggesting a false positive match.
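A minimal sketch of the modified Z-score calculation is given below. It assumes the conventional 0.6745 scaling constant from Iglewicz and Hoaglin; the read counts are invented for illustration and the 3.5 cutoff is the value commonly recommended for this statistic, not a threshold stated in the text.

```python
from statistics import median


def modified_z_scores(counts):
    """Modified Z-score per sample, based on the median and the MAD."""
    med = median(counts)
    mad = median([abs(x - med) for x in counts])
    if mad == 0:
        # More than half of the samples equal the median.
        raise ValueError("MAD is zero; modified Z-score is undefined")
    return [0.6745 * (x - med) / mad for x in counts]


# Hypothetical read counts for one species across six samples.
example_counts = [0, 1, 2, 3, 4, 179]
scores = modified_z_scores(example_counts)

# Assumed cutoff of 3.5 (commonly recommended, not taken from the text).
outliers = [i for i, s in enumerate(scores) if abs(s) > 3.5]
```

Because the median and MAD are barely affected by a single extreme value, the one high count stands out clearly while the remaining samples stay close to zero.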
Analysis of candidate pathogen reads found in control samples
For 7/9 candidate infectious microbes, we found small numbers of reads, ranging from 1-64, in one or more control samples. For 8/9 of these pathogens, we found small numbers of reads in other non-control samples. In order to clarify why these reads were present, we analyzed them to determine if they were either (a) computational false positives or (b) possible cross-contamination in the multiplexed sequencing experiment. In addition to counting reads, KrakenUniq counts the number of unique k-mers (k=31 in our experiments) found in each species in a sample 13 . Each 150-bp read may contain as many as 120 unique 31-mers (150 - 31 + 1), if the hit is a true positive and if each k-mer is distinct. For all of the candidate infectious agents, the number of unique k-mers per read was quite high, ranging from 50 to >100. If the unique k-mer count for a read is low, the read may consist of low-complexity, repetitive sequence, suggesting that the match is a computational false positive. To check for this possibility, from each of the control samples that had reads matching a candidate infectious agent, we aligned those reads using BLAST 17 against NCBI's comprehensive "nr" nucleotide database. If the reads hit the genome of the candidate pathogen, that suggested cross-contamination in the sample. If the reads matched other genomes or did not match the genome of interest, that suggested they were false positives.
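The relationship between read complexity and distinct k-mer count can be sketched as follows. A read of length L contains at most L - k + 1 overlapping k-mers, and repetitive reads collapse to far fewer distinct ones. The function names and example sequences are hypothetical.

```python
def max_kmers(read_length, k):
    """Upper bound on the number of overlapping k-mers in one read."""
    return max(read_length - k + 1, 0)


def distinct_kmers(read, k):
    """Number of distinct k-mers actually present in a read."""
    return len({read[i:i + k] for i in range(len(read) - k + 1)})


K = 31
upper_bound = max_kmers(150, K)               # 150 - 31 + 1
homopolymer = distinct_kmers("A" * 150, K)    # every k-mer is identical
dinucleotide = distinct_kmers("AC" * 75, K)   # k-mers alternate between two
```

A read whose distinct k-mer count is far below the upper bound, as in the two repetitive examples here, is a candidate low-complexity false positive of the kind described above.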
This evaluation found that small levels of cross-contamination explained the control sample matches for seven of the eight candidate pathogens identified in Table 3, as follows.
(1) Kraken identified 0-4 reads as Campylobacter concisus in the control samples, and BLAST alignments confirmed that they matched C. concisus, suggesting a small amount of cross-contamination.
(2) For Neisseria elongata, Kraken found 1-14 reads in the control samples, and all were confirmed by BLAST.
(3) For Exophiala oligosperma, we found 1-2 reads in the controls and all were confirmed by BLAST.
(4) For Streptococcus salivarius, we found 3-33 reads in the control samples, and we confirmed a random sample of them using BLAST.
(5) We found 2-13 reads matching Pseudopropionibacterium propionicum in the control samples, and all were confirmed by BLAST.
(6) We found 1-8 reads in the control samples matching Aspergillus versicolor and confirmed a random sample of them by BLAST.
(7) We found 2-64 reads matching Paracoccus yeei in the control samples and all were confirmed by BLAST.
(8) For Lomentospora prolificans, we found 0 reads in the control samples; however, Kraken identified 1-66 reads in the non-control samples. We searched a sample of these reads against "nr" using BLAST, and all aligned to different species while none had BLAST alignments to L. prolificans. Upon further inspection, all the reads had a very low number of unique k-mers. Thus, we determined that these reads were low complexity, repetitive sequences that yielded false positive matches.

Demographic and clinical data
The demographic and clinical data of the patients (16 cases and 4 controls) whose archival tissue specimens were used in the study are presented in Table 2. The cases ranged in age from 32 to 79 years while the controls ranged in age from 38 to 71 years. Among the cases, 13 were female and three were male, while among the controls three were female and one male. Seven of the cases were diagnosed to have systemic sarcoidosis while none of the controls were reported to have systemic sarcoidosis.

Metagenomics analysis
We identified pathogens that were possibly associated with disease in eight of the 16 case samples (Table 3). For seven of the samples, a possible pathogen species was present at a much higher level than in any of the controls or the other clinical samples, and for one sample (sample 115), two possible pathogens were identified. For each of the eight samples and nine pathogens, the read counts for the pathogen were statistically higher than expected based on the distribution of read counts in all other samples. We measured this expectation using a modified z-score, which is analogous to the number of standard deviations above the mean but is computed from the sample median and the median absolute deviation of the read counts (see Methods). Below we briefly discuss each of the eight samples in which possible infectious agents were detected.

Sample 101. Sample 101 contained 179 read pairs from Campylobacter concisus, while no other sample had more than seven read pairs, a level that could be explained by cross-contamination from the multiplexed sequencing run. The controls had 0-4 reads (Table 2). This is a highly significant finding, with a modified z-score of 119.

Histopathological data
Histopathological examination carried out as part of routine medical care of all the cases showed typical non-caseating granulomas. Representative histopathological images from three of the eight cases that were positive for microbial DNA are presented in Figure 1. Except for specimen 115, the seven other cases were negative on acid-fast and fungal stains at the time of initial histopathological evaluation. Specimen 115, which was obtained from the patient during a corneal transplant procedure and was the same specimen used in our study, did not undergo acid-fast or fungal staining at the time of its initial histopathological examination. However, this case underwent another corneal transplant procedure eight months after the initial transplant and the specimen obtained at that time, while still showing non-caseating granulomas, was negative on acid-fast and fungal stains.

Discussion
In this study we conducted a metagenomics analysis of DNA extracted from archival tissue specimens that were obtained from 16 cases with ocular or periocular sarcoidosis and identified DNA evidence of a possible microbial pathogen in eight of the cases. The microbial agents identified from the tissue specimens were five species of bacteria (Campylobacter concisus, Neisseria elongata, Streptococcus salivarius, Pseudopropionibacterium propionicum, and Paracoccus yeei), three species of fungi (Exophiala oligosperma, Lomentospora prolificans and Aspergillus versicolor) and one species of virus (Mupapillomavirus 1).
The case that was positive for Campylobacter concisus DNA had orbital and pulmonary sarcoidosis. C. concisus is a Gram-negative bacterium that colonizes the oral cavity of humans 20,21 . Currently, humans are the only known hosts of this bacterium 20,21 . A few studies have found an association between C. concisus and Barrett's esophagus 22,23 . In addition, recent studies have also demonstrated an association between Crohn's disease and C. concisus, which could translocate from the oral cavity to the intestine 24,25 . It is plausible that C. concisus could be aspirated from the oral cavity to the lungs, and then also potentially to distant organs such as the orbit, where it could incite an inflammatory process. The case that was positive for Neisseria elongata DNA had orbital sarcoidosis with no systemic sarcoidosis reported. N. elongata is a Gram-negative bacterium that is part of the normal flora of the oral cavity 26 . There are a number of case reports of infective endocarditis associated with colonization by N. elongata 26-29 . In addition, the bacterium has been implicated in some cases of osteomyelitis 29,30 .
The case in which Streptococcus salivarius DNA was detected had conjunctival sarcoidosis without reported evidence of systemic sarcoidosis. S. salivarius is a Gram-positive bacterium which is part of the normal flora of the oral cavity 31 . It establishes itself in the human oral cavity within a few hours after birth and persists as a predominant inhabitant of the oral cavity 32 . The bacterium has been associated with invasive infections including meningitis 31 , bacteremia 33 and prosthetic joint infection 34 . Interestingly, S. salivarius has also been associated with exogenous endophthalmitis following keratoplasty with a contaminated donor cornea 35 and after an intravitreal injection 36 .
Pseudopropionibacterium propionicum (formerly known as Propionibacterium propionicum, Arachnia propionica and Actinomyces propionicus) DNA was detected in a case that had conjunctival sarcoidosis without reported systemic sarcoidosis. P. propionicum is a Gram-positive bacterium that is part of the human oral flora 37 . It has been associated with human infectious diseases that resemble actinomycosis. There are case reports of the bacterium being associated with lacrimal canaliculitis, cervicofacial infections 38,39 , tympanomastoiditis 40 , pulmonary and thoracic infections 19,41,42 , osteomyelitis 43 and brain abscess 44 . Infection by P. propionicum causes chronic granulomatous inflammation characterized by abscesses, draining sinuses and fibrosis 19,45 .
Paracoccus yeei DNA was detected in a patient who had sarcoidosis that involved the iris, ciliary body, choroid and retina; this case did have reported evidence of cutaneous sarcoidosis. P. yeei is a Gram-negative bacterium that is found naturally in soil and brine 46 . In a study involving 1321 patients with idiopathic uveitis, Drancourt et al. detected P. yeei in one patient by conducting 16S rDNA sequencing on an intraocular fluid specimen 47 . In another study, P. yeei was cultured from the aqueous humor of a patient who had developed corneal graft rejection 48 . In addition, P. yeei has been associated with peritonitis in a patient undergoing peritoneal dialysis 49 and with cutaneous infection followed by bacteremia in a patient with heart failure 50 .
The case in which Exophiala oligosperma DNA was detected had conjunctival sarcoidosis with no reported systemic sarcoidosis. E. oligosperma is a dimorphic fungus that has been associated with cutaneous and subcutaneous lesions 18 and olecranon bursitis 51 . Exophiala species have been isolated from the skin, cutaneous tissues, the heart, the lungs, bone and the central nervous system 51-54 . Interestingly, a member of the genus Exophiala (E. jeanselmei) has been associated with keratitis 55 and another member (E. dermatitidis) with endophthalmitis 56 .
The case in which DNA belonging to each of Lomentospora prolificans and Aspergillus versicolor was simultaneously detected had corneal sarcoidosis with reported pulmonary sarcoidosis; in addition, the case had sarcoidosis-associated panuveitis of the affected eye. L. prolificans is an anamorphic fungus that has been associated with localized bone and joint infections in the immunocompetent host and with disseminated disease (involving the lungs, the ears, the eyes and the central nervous system) in the immunocompromised host 57,58 . A. versicolor is a filamentous fungus. It has been associated with invasive pulmonary aspergillosis 59 , onychomycosis 60 and endogenous endophthalmitis 61 .
The case that was positive for Mupapillomavirus 1 had orbital sarcoidosis that involved the extraocular muscle tissues with no systemic sarcoidosis reported. Mupapillomavirus 1 is a double-stranded DNA virus that belongs to the virus family Papillomaviridae. It has been isolated from plantar warts 62 and from punctate keratotic lesions of the foot 63 . Interestingly, the virus has also been detected, using a PCR method, in the lesions of cutaneous sarcoidosis in a patient who also had pulmonary sarcoidosis 64 . In addition, other human papillomaviruses have been associated with ocular diseases, including pterygium and ocular surface squamous neoplasia 65 .
In this study, we have identified nine different microorganisms in eight cases of ocular and periocular sarcoidosis. It is not known at this time if any of these microorganisms play any role in the causation of sarcoidosis. The microbial agents could gain access to the ocular and periocular tissues directly from the environment (especially after trauma or surgery) or could reach these tissues via hematogenous spread after initial colonization of distant tissues such as the lungs, the skin and the subcutaneous tissues. It is interesting to note that four of the five bacterial species that were identified by our study are also part of the human oral microbiome. In those cases, the oral cavity could be the source of the microorganisms that involved the ocular and periocular tissues.
One perplexing finding of our study is that none of the nine microorganisms were detected in more than one case. A possible explanation for this observation is that sarcoidosis is an etiologically heterogenous disease. In support of this argument, it is important to note that sarcoidosis, in addition to being associated with a number of microbial agents, has also been linked to a number of inanimate sources of antigens, including tattoo ink, aluminum, zirconium, talc, and insecticides 5,6 .
A limitation of our study is that potential RNA viruses could not be detected because only DNA was extracted and sequenced. The relatively small sample size and the use of paraffin-embedded archival tissue specimens are additional shortcomings. Future studies using a metagenomics approach on a much larger sample size and employing fresh tissue specimens from a variety of sources are recommended.

Conclusions
In this study, using a metagenomics approach, we identified nine potential microbial agents in tissue specimens of eight cases of ocular and periocular sarcoidosis. The role of these microorganisms in the causation of sarcoidosis is not clear at this time. Our study has limitations due to the relatively small sample size and due to the fact that metagenomics analysis was carried out on archival tissue specimens. Large-scale metagenomics studies using fresh tissue specimens are needed to provide a better understanding of the potential role of microbial agents in the causation of sarcoidosis. The results of such studies could lead to improved means for the diagnosis and treatment of sarcoidosis.
